卷积神经网络(Convnets或CNNS)已被坦率地部署在计算机视觉和相关领域的范围中。然而,这些神经网络的训练动态仍然难以捉摸:训练它们很难且计算昂贵。已经提出了无数的架构和培训策略来克服这一挑战,并解决了图像处理中的几个问题,例如语音,图像和动作识别以及对象检测。在本文中,我们提出了一种基于粒子群优化(PSO)的新型训练。在这样的框架中,每个转弯的权重向量通常被铸成一个粒子在相空间中的位置,从而使PSO协作动力学与随机梯度下降(SGD)交织在一起,以提高训练性能和泛化。我们的方法如下:i)[常规阶段]每个Convnet都通过SGD独立训练; ii)[协作阶段] convnets在当前的权重(或粒子位置)及其对损耗函数的梯度估计中共享。不同的台阶尺寸由不同的convnet创造。通过将较大(可能是随机)的阶梯尺寸以及更保守的阶梯尺寸正确混合,我们提出了一种具有竞争性能的算法,相对于CIFAR-10的其他基于PSO的方法(精度为98.31%)。这些准确性水平是通过仅诉诸四个Convnet来获得的 - 预计此类结果将随着协作交流的数量而扩展。我们使我们的源代码可用于下载https://github.com/leonlha/pso-convnet-dynamics。
translated by 谷歌翻译
Video understanding is a growing field and a subject of intense research, which includes many interesting tasks to understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, video retrieval. One of the most challenging problems in video understanding is dealing with feature extraction, i.e. extract contextual visual representation from given untrimmed video due to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black-box to extract visual representation, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is very crucial to design a contextual explainable video representation extraction that can capture each of such factors and model the relationships between them. In this paper, we discuss approaches, that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception based-contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION that in all call for more research effort in this area.
translated by 谷歌翻译
Event Extraction (EE) is one of the fundamental tasks in Information Extraction (IE) that aims to recognize event mentions and their arguments (i.e., participants) from text. Due to its importance, extensive methods and resources have been developed for Event Extraction. However, one limitation of current research for EE involves the under-exploration for non-English languages in which the lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. To address this limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages. MEE comprehensively annotates data for entity mentions, event triggers and event arguments. We conduct extensive experiments on the proposed dataset to reveal challenges and opportunities for multilingual EE.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Predictive simulations of the shock-to-detonation transition (SDT) in heterogeneous energetic materials (EM) are vital to the design and control of their energy release and sensitivity. Due to the complexity of the thermo-mechanics of EM during the SDT, both macro-scale response and sub-grid mesoscale energy localization must be captured accurately. This work proposes an efficient and accurate multiscale framework for SDT simulations of EM. We employ deep learning to model the mesoscale energy localization of shock-initiated EM microstructures upon which prediction results are used to supply reaction progress rate information to the macroscale SDT simulation. The proposed multiscale modeling framework is divided into two stages. First, a physics-aware recurrent convolutional neural network (PARC) is used to model the mesoscale energy localization of shock-initiated heterogeneous EM microstructures. PARC is trained using direct numerical simulations (DNS) of hotspot ignition and growth within microstructures of pressed HMX material subjected to different input shock strengths. After training, PARC is employed to supply hotspot ignition and growth rates for macroscale SDT simulations. We show that PARC can play the role of a surrogate model in a multiscale simulation framework, while drastically reducing the computation cost and providing improved representations of the sub-grid physics. The proposed multiscale modeling approach will provide a new tool for material scientists in designing high-performance and safer energetic materials.
translated by 谷歌翻译
Robots have been brought to work close to humans in many scenarios. For coexistence and collaboration, robots should be safe and pleasant for humans to interact with. To this end, the robots could be both physically soft with multimodal sensing/perception, so that the robots could have better awareness of the surrounding environment, as well as to respond properly to humans' action/intention. This paper introduces a novel soft robotic link, named ProTac, that possesses multiple sensing modes: tactile and proximity sensing, based on computer vision and a functional material. These modalities come from a layered structure of a soft transparent silicon skin, a polymer dispersed liquid crystal (PDLC) film, and reflective markers. Here, the PDLC film can switch actively between the opaque and the transparent state, from which the tactile sensing and proximity sensing can be obtained by using cameras solely built inside the ProTac link. In this paper, inference algorithms for tactile proximity perception are introduced. Evaluation results of two sensing modalities demonstrated that, with a simple activation strategy, ProTac link could effectively perceive useful information from both approaching and in-contact obstacles. The proposed sensing device is expected to bring in ultimate solutions for design of robots with softness, whole-body and multimodal sensing, and safety control strategies.
translated by 谷歌翻译
流视频是创作者与观众分享创意作品的方法之一。在这些视频中,流媒体分享了如何通过在一个或几个用于创意项目的程序中使用各种工具来实现最终目标。为此,可以讨论实现最终目标所需的步骤。因此,这些视频可以提供大量的教育内容,这些内容可用于学习如何使用流媒体使用的工具。但是,缺点之一是,流媒体可能无法为每个步骤提供足够的详细信息。因此,对于学习者来说,可能很难赶上所有步骤。为了减轻此问题,一种解决方案是将流视频与流视频中使用的工具可用的相关教程联系起来。更具体地说,系统可以分析实时流媒体视频的内容,并推荐最相关的教程。由于现有的文档推荐模型无法处理这种情况,因此在这项工作中,我们为实时流程视频的教程建议提供了一个新颖的数据集和模型。我们对拟议的数据集和模型进行了广泛的分析,揭示了该任务的挑战性质。
translated by 谷歌翻译
键形提取是NLP中文档理解的重要任务之一。虽然大多数先前的作品都致力于正式设置,例如书籍,新闻或网络博客,但探索视频成绩单等非正式文本的探索较少。为了解决这一局限性,在这项工作中,我们提出了一种新颖的语料库和方法,用于从Behance平台上流的视频的成绩单中提取钥匙短语。更具体地说,在这项工作中,提出了一种新型的数据增强,以通过从其他域中提取键形提取任务的背景知识来丰富模型。提出的数据集数据集上的广泛实验显示了引入方法的有效性。
translated by 谷歌翻译